{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 池化\n",
    "\n",
    ":label:`sec_pooling`\n",
    "\n",
    "\n",
    "\n",
    "通常，当我们处理图像时，我们希望逐渐\n",
    "降低隐藏表示的空间分辨率，\n",
    "聚合信息以便\n",
    "我们在网络中的地位越高，\n",
    "感受野越大（在输入端）\n",
    "每个隐藏节点都对其敏感。\n",
    "\n",
    "通常，我们的最终任务会问一些关于形象的全球性问题，\n",
    "e.g.，*里面有猫吗*\n",
    "所以最后一层的节点通常是敏感的\n",
    "到整个输入。\n",
    "通过逐渐聚合信息，生成越来越粗糙的地图，\n",
    "我们最终实现了学习全球代表性的目标，\n",
    "同时在处理的中间层保留卷积层的所有优点。\n",
    "\n",
    "\n",
    "此外，在检测较低级别的特征时，例如边缘\n",
    "（如 :numref:`sec_conv_layer`）中所述，\n",
    "我们经常希望我们的表达在某种程度上对翻译保持不变。\n",
    "例如，如果我们拍摄图像 `X`\n",
    "黑与白之间有着清晰的界限\n",
    "将整个图像向右移动一个像素，\n",
    "例如，`Z[i, j] = X[i, j+1]` ，\n",
    "那么新图像 `Z` 的输出可能会有很大的不同。\n",
    "边缘将移动一个像素，并随之进行所有激活。\n",
    "实际上，物体几乎不会完全出现在同一个地方。\n",
    "事实上，即使有三脚架和固定物体，\n",
    "快门移动导致相机振动\n",
    "可能会让所有东西都偏移一个像素左右\n",
    "（高端摄像头配备了特殊功能来解决这个问题）。\n",
    "\n",
    "本节介绍池层，\n",
    "有两个目的\n",
    "降低卷积层对位置的敏感性\n",
    "以及空间下采样表示。\n",
    "\n",
    "## 最大池和平均池\n",
    "\n",
    "像卷积层，池操作符\n",
    "由一个固定形状的滑动窗口组成\n",
    "根据步幅输入的所有区域，\n",
    "为每个经过的位置计算单个输出\n",
    "通过固定形状窗口（有时称为*池窗口*）。\n",
    "然而，与互相关计算不同\n",
    "卷积层中的输入和内核，\n",
    "池层不包含任何参数（没有*过滤器*）。\n",
    "相反，池运算符是确定性的，\n",
    "通常计算最大值或平均值\n",
    "池窗口中的元素的。\n",
    "这些操作称为*最大池*（*max pooling*简称）\n",
    "和*平均池*。\n",
    "\n",
    "在这两种情况下，与互相关算子一样，\n",
    "我们可以考虑池窗口\n",
    "从输入数组的左上角开始\n",
    "从左到右，从上到下滑动输入数组。\n",
    "在池窗口命中的每个位置，\n",
    "它计算最大值或平均值\n",
    "窗口中输入子数组的值\n",
    "（取决于使用的是*最大*还是*平均*池）。\n",
    "\n",
    "\n",
    "![Maximum pooling with a pooling window shape of $2\\times 2$. The shaded portions represent the first output element and the input element used for its computation: $\\max(0, 1, 3, 4)=4$](https://raw.githubusercontent.com/d2l-ai/d2l-en/master/img/pooling.svg)\n",
    "\n",
    ":label:`fig_pooling`\n",
    "\n",
    "\n",
    "上面的:numref:`fig_pooling`中的输出数组的高度为2，宽度为2。\n",
    "这四个元素来自 $\\text{max}$的最大值：\n",
    "\n",
    "$$\n",
    "\\max(0, 1, 3, 4)=4,\\\\\n",
    "\\max(1, 2, 4, 5)=5,\\\\\n",
    "\\max(3, 4, 6, 7)=7,\\\\\n",
    "\\max(4, 5, 7, 8)=8.\\\\\n",
    "$$\n",
    "\n",
    "池窗口形状为 $p \\times q$ 的池层\n",
    "被称为$p \\times q$ 池层。\n",
    "池操作称为 $p \\times q$ 池。\n",
    "\n",
    "让我们回到物体边缘检测的例子\n",
    "在本节开头提到。\n",
    "现在我们将使用卷积层的输出\n",
    "作为 $2\\times 2$ 最大池的输入。\n",
    "将卷积层输入设置为 `X` ，池层输出设置为 `Y`。无论 `X[i, j]` 和 `X[i, j+1]` 的值是否不同，\n",
    "或者 `X[i, j+1]` 和 `X[i, j+2]` 是不同的，\n",
    "池层输出都包括 `Y[i, j]=1`。\n",
    "也就是说，使用 $2\\times 2$ 最大池层，\n",
    "我们仍然可以检测出卷积层是否识别出模式\n",
    "在高度和宽度上最多移动一个元素。\n",
    "\n",
    "在下面的代码中，我们实现了正向计算\n",
    " `pool2d` 函数中池层的。\n",
    "此函数类似于 `corr2d` 函数\n",
    "in:numref:`sec_conv_layer`。\n",
    "然而，在这里，我们没有计算输出的内核\n",
    "作为输入中每个区域的最大值或平均值。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%load ../utils/djl-imports"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "attributes": {
     "classes": [],
     "id": "",
     "n": "3"
    }
   },
   "outputs": [],
   "source": [
    "NDManager manager = NDManager.newBaseManager();\n",
    "\n",
    "public NDArray pool2d(NDArray X, Shape poolShape, String mode){\n",
    "    \n",
    "    long poolHeight = poolShape.get(0);\n",
    "    long poolWidth = poolShape.get(1);\n",
    "    \n",
    "    NDArray Y = manager.zeros(new Shape(X.getShape().get(0) - poolHeight + 1, \n",
    "                                        X.getShape().get(1) - poolWidth + 1));\n",
    "    for(int i=0; i < Y.getShape().get(0); i++){\n",
    "        for(int j=0; j < Y.getShape().get(1); j++){\n",
    "            \n",
    "            if(\"max\".equals(mode)){\n",
    "                Y.set(new NDIndex(i+\",\"+j), \n",
    "                            X.get(new NDIndex(i + \":\" + (i + poolHeight) + \", \" + j + \":\" + (j + poolWidth))).max());\n",
    "            }\n",
    "            else if(\"avg\".equals(mode)){\n",
    "                Y.set(new NDIndex(i+\",\"+j),\n",
    "                            X.get(new NDIndex(i + \":\" + (i + poolHeight) + \", \" + j + \":\" + (j + poolWidth))).mean());\n",
    "            }\n",
    "            \n",
    "        }\n",
    "    }\n",
    "    \n",
    "    return Y;\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "我们可以在上图中构造输入数组 `X` ，以验证二维最大池层的输出。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "attributes": {
     "classes": [],
     "id": "",
     "n": "4"
    }
   },
   "outputs": [],
   "source": [
    "NDArray X = manager.arange(9f).reshape(3,3);\n",
    "pool2d(X, new Shape(2,2), \"max\");"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "同时，我们对平均池层进行了实验。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "attributes": {
     "classes": [],
     "id": "",
     "n": "14"
    }
   },
   "outputs": [],
   "source": [
    "pool2d(X, new Shape(2,2), \"avg\");"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 填充和跨步\n",
    "\n",
    "与卷积层一样，池层\n",
    "还可以更改输出形状。\n",
    "和以前一样，我们可以改变操作以获得所需的输出形状\n",
    "通过填充输入和调整步幅。\n",
    "我们可以演示填充和跨步的使用\n",
    "通过二维最大池层 `maxPool2dBlock`在池层中\n",
    "在DJL的 `Pool` 模块中发货。\n",
    "我们首先构造一个形状为 `(1, 1, 4, 4)`的输入数据，\n",
    "其中前两个维度是批次和通道。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "attributes": {
     "classes": [],
     "id": "",
     "n": "15"
    }
   },
   "outputs": [],
   "source": [
    "X = manager.arange(16f).reshape(1, 1, 4, 4);\n",
    "X"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "下面，我们使用一个形状为 `(3, 3)` 的池窗口，\n",
    "步幅形状为 `(3, 3)`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "attributes": {
     "classes": [],
     "id": "",
     "n": "16"
    }
   },
   "outputs": [],
   "source": [
    "// 定义块指定内核和步幅\n",
    "Block block = Pool.maxPool2dBlock(new Shape(3, 3), new Shape(3, 3));\n",
    "block.initialize(manager, DataType.FLOAT32, new Shape(1,1,4,4));\n",
    "\n",
    "ParameterStore parameterStore = new ParameterStore(manager, false);\n",
    "// 因为池层中没有模型参数，所以我们不需要\n",
    "// 调用参数初始化函数\n",
    "block.forward(parameterStore, new NDList(X), true).singletonOrThrow();"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "步幅和填充可以手动指定。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "attributes": {
     "classes": [],
     "id": "",
     "n": "7"
    }
   },
   "outputs": [],
   "source": [
    "// 重新定义内核形状、跨步形状和垫块形状\n",
    "block = Pool.maxPool2dBlock(new Shape(3,3), new Shape(2,2), new Shape(1,1));\n",
    "// block forward 方法\n",
    "block.forward(parameterStore, new NDList(X), true).singletonOrThrow();"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "当然，我们可以指定一个任意的矩形池窗口\n",
    "并分别为高度和宽度指定填充和跨步。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "attributes": {
     "classes": [],
     "id": "",
     "n": "8"
    }
   },
   "outputs": [],
   "source": [
    "// 重新定义内核形状、跨步形状和垫块形状\n",
    "block = Pool.maxPool2dBlock(new Shape(2,3), new Shape(2,3), new Shape(1,2));\n",
    "block.forward(parameterStore, new NDList(X), true).singletonOrThrow();"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 多通道\n",
    "\n",
    "在处理多通道输入数据时，\n",
    "池化层分别池化每个输入通道，\n",
    "而不是一个通道一个通道地添加每个通道的输入\n",
    "就像在卷积层中一样。\n",
    "这意味着池层的输出通道数\n",
    "与输入通道的数量相同。\n",
    "下面，我们将把数组 `X` 和 `X+1`连接起来\n",
    "在通道维度上，使用2个通道构造输入。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "attributes": {
     "classes": [],
     "id": "",
     "n": "9"
    }
   },
   "outputs": [],
   "source": [
    "X = X.concat(X.add(1), 1);\n",
    "X"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "正如我们所看到的，在合并后，输出通道的数量仍然是2。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "attributes": {
     "classes": [],
     "id": "",
     "n": "10"
    }
   },
   "outputs": [],
   "source": [
    "block = Pool.maxPool2dBlock(new Shape(3,3), new Shape(2,2), new Shape(1,1));\n",
    "block.forward(parameterStore, new NDList(X), true).singletonOrThrow();"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 总结\n",
    "\n",
    "* 对于池窗口中的输入元素，最大池操作将最大值指定为输出，平均池操作将平均值指定为输出。\n",
    "* 池层的主要功能之一是减轻卷积层对位置的过度敏感性。\n",
    "* 我们可以为池层指定填充和跨步。\n",
    "* 最大池，再加上大于1的步幅，可以用来降低分辨率。\n",
    "* 池层的输出通道数与输入通道数相同。\n",
    "\n",
    "\n",
    "## 练习\n",
    "\n",
    "1. 你能把平均池作为卷积层的一个特例来实现吗？如果是这样，那就去做。\n",
    "1. 你能把最大池作为卷积层的特例来实现吗？如果是这样，那就去做。\n",
    "1. 池层的计算成本是多少？假设池层的输入大小为 $c\\times h\\times w$，池窗口的形状为 $p_h\\times p_w$ ，填充为 $(p_h, p_w)$ ，跨步为$(s_h, s_w)$ 。\n",
    "1. 为什么您希望最大池和平均池的工作方式有所不同？\n",
    "1. 我们需要一个单独的最小池层吗？你能换个运算吗？\n",
    "1. 是否可以考虑平均和最大池之间的另一种操作（提示：recall the softmax）？为什么不那么受欢迎？"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Java",
   "language": "java",
   "name": "java"
  },
  "language_info": {
   "codemirror_mode": "java",
   "file_extension": ".jshell",
   "mimetype": "text/x-java-source",
   "name": "Java",
   "pygments_lexer": "java",
   "version": "14.0.2+12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
